batch18
QC REPORT
Input files downloaded from:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/mbrave_batch_data/batch18/
Output files are saved to:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/
The consensus network .tsv file exists: TRUE
The fasta file exists: TRUE
The sample statistics file exists: TRUE
The negative control statistics file exists: TRUE
The positive control statistics file exists: TRUE
Total number of positive controls: 96
Number of positive controls per plate: 1
All plates have positive controls: TRUE
Total number of reads in positive controls: 29562
Maximum number of reads: 466 in positive control sample: CONTROL_POS_BAYS_017_G12
Minimum number of reads: 3 in positive control sample: CONTROL_POS_FACE_201_G12
Average number of positive control reads: 307.9375
Median number of positive control reads: 317
Read standard deviation: 85.9480641832154
Quantiles:
5%: 128.5
10%: 191.5
25%: 279
50%: 317
75%: 354.25
95%: 436
100%: 466
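The summary statistics above can be reproduced with a short Python sketch. It assumes the positive control read counts are available as a plain list, and that quantiles follow R's default definition (type 7, linear interpolation between order statistics). Both are assumptions about how the report was generated, and the function names are hypothetical.

```python
import math
import statistics


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def summarise_reads(reads):
    """Return the summary statistics shown in the positive control section."""
    probs = (0.05, 0.10, 0.25, 0.50, 0.75, 0.95, 1.0)
    return {
        "total": sum(reads),
        "max": max(reads),
        "min": min(reads),
        "mean": statistics.mean(reads),
        "median": statistics.median(reads),
        # Sample standard deviation (n - 1 denominator), as R's sd() computes it
        "sd": statistics.stdev(reads),
        "quantiles": {p: quantile_type7(reads, p) for p in probs},
    }
```

As a consistency check on the reported values, the mean is the total divided by the number of controls: 29562 / 96 = 307.9375.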
Blue solid line: read mean
Orange dotted lines: lower 5% and 10% quantiles
Number of positive control samples in the lower 5% quantile: 5
CONTROL_POS_FACE_201_G12
CONTROL_POS_FACE_202_G12
CONTROL_POS_FACE_205_G12
CONTROL_POS_FACE_207_G12
CONTROL_POS_FACE_208_G12
Names of the associated partners: BIFOR
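A minimal sketch of the low-read flagging step, assuming a hypothetical mapping from control name to read count and the same type 7 quantile as above. Comparing with `<=` against the cut-off is an assumption: the report does not state the rule, but an inclusive comparison would explain why more than 5% of samples can land in the "lower 5%" when many samples share the cut-off value.

```python
import math


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def flag_low_controls(reads_by_sample, p=0.05):
    """Return sample names whose read count is at or below the lower p-quantile."""
    cutoff = quantile_type7(list(reads_by_sample.values()), p)
    return sorted(name for name, n in reads_by_sample.items() if n <= cutoff)
```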
Total number of negative controls: 621
Total number of lysate negative controls: 525
Total number of empty negative controls: 96
Number of negative controls per plate:
| Number of negative controls per plate | Number of plates |
|---|---|
| 2 | 87 |
| more than 2 | 9 |
All plates have negative controls: TRUE
Total number of reads in lysate negative controls: 1629
Total number of reads in empty negative controls: 86
Maximum number of reads: 292 in lysate negative control sample: CONTROL_NEG_LYSATE_FACE_203_H12
Maximum number of reads: 5 in empty negative control sample: CONTROL_NEG_BAYS_019_A2
Zero reads in: 424 negative control samples
In lysate controls: 381
In empty controls: 43
Average number of negative control reads: 2.76167471819646
In lysate controls: 3.10285714285714
In empty controls: 0.895833333333333
Median number of negative control reads: 0
In lysate controls: 0
In empty controls: 1
Skewness of negative control read counts: 11.0469717733312
In lysate controls: 10.1363579769267
In empty controls: 1.30385483117931
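The per-type zero counts, means, medians, and skewness values above can be sketched as follows, assuming the controls arrive as hypothetical `(control_type, read_count)` pairs. The report does not say which skewness estimator it uses; this sketch uses the moment-based g1 = m3 / m2^1.5, which can differ slightly from bias-adjusted conventions used by some R packages.

```python
import statistics
from collections import defaultdict


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return float("nan")  # undefined when every value is identical
    return m3 / m2 ** 1.5


def summarise_by_type(controls):
    """controls: iterable of (control_type, read_count) pairs."""
    groups = defaultdict(list)
    for ctype, reads in controls:
        groups[ctype].append(reads)
        groups["all"].append(reads)  # pooled statistics across both types
    return {
        ctype: {
            "zero": sum(1 for r in rs if r == 0),
            "mean": statistics.mean(rs),
            "median": statistics.median(rs),
            "skew": skewness(rs),
        }
        for ctype, rs in groups.items()
    }
```

The strongly positive skewness in the lysate controls is what a mostly-zero distribution with a few heavily contaminated wells looks like.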
Quantiles in lysate controls:
5%: 0
10%: 0
25%: 0
50%: 0
75%: 1
95%: 3
98%: 15.56
Quantiles in empty controls:
5%: 0
10%: 0
25%: 0
50%: 1
75%: 1
95%: 3
98%: 3.1
Blue solid line: read mean
Orange dotted lines: upper 5% and 2% of samples with the highest number of reads
Number of negative control samples in the higher 5%: 39
Of these, lysate controls: 30
Of these, empty controls: 9
Number of negative control samples in the higher 2%: 13
Of these, lysate controls: 13
Of these, empty controls: 0
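The figures above suggest the upper cut-offs are computed over all negative controls pooled and then counted per control type: no empty control reaches the 2% cut-off even though one holds 5 reads, which is above the empty-only 98% quantile of about 3.1. The sketch below follows that reading; the input layout `(sample, control_type, reads)` and the inclusive `>=` comparison are assumptions.

```python
import math
from collections import Counter


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def upper_tail_by_type(controls, p):
    """Count controls at or above the pooled p-quantile, split by control type."""
    cutoff = quantile_type7([reads for _, _, reads in controls], p)
    hits = [(sample, ctype) for sample, ctype, reads in controls if reads >= cutoff]
    return len(hits), Counter(ctype for _, ctype in hits)
```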
Names of the associated partners: BAYS, CAMP, BIFOR, LFLA, RRHP, WTPV
Number of samples in the batch (excluding controls): 8499
Total number of partner plates: 96
Total number of sample reads: 2832708
Maximum number of sample reads: 906 in sample: LFLA_010_A2
Minimum number of sample reads: 0 in 310 samples, which is 3.64748793975762 % of all samples
Average number of reads: 333.298976350159
Median number of reads: 360
Read standard deviation: 200.8572183961
Skewness of sample read counts: -0.170122229845947
Quantiles:
5%: 1
10%: 18
25%: 166.5
50%: 360
75%: 489
95%: 636
100%: 906
Blue solid line: read mean
Orange dotted lines: lower 5% and 10% of samples
Number of samples in the lower 10%: 851 out of 8499 samples
Number of samples in the lower 5%: 461 out of 8499 samples
Partners associated with the bottom 5% of samples by read count:
| Partner names | Frequency |
|---|---|
| RRHP | 176 |
| BIFOR | 127 |
| LFLA | 70 |
| BAYS | 57 |
| WTPV | 23 |
| CAMP | 8 |
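The partner tally above can be sketched with a `Counter`. Partner codes in this batch do not always match the sample-ID prefix (the FACE plates are reported under BIFOR), so the sketch takes a hypothetical `plate_to_partner` lookup, for example one built from the batch metadata. The plate ID is assumed to be the first two underscore-separated fields of the sample ID (e.g. `LFLA_010_A2` gives `LFLA_010`).

```python
from collections import Counter


def partner_frequency(sample_ids, plate_to_partner):
    """Count low-read samples per partner, most frequent first."""
    plates = ("_".join(sid.split("_")[:2]) for sid in sample_ids)
    return Counter(plate_to_partner[plate] for plate in plates).most_common()
```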
Number of samples with 0 reads: 310
Partner plates where the 75th percentile of the read counts is lower than the expected mean read count (dark grey):
FACE_006
FACE_010
FACE_011
FACE_013
FACE_017
FACE_020
FACE_201
FACE_202
FACE_203
FACE_204
FACE_205
FACE_206
FACE_207
FACE_208
RRHP_026
RRHP_027
RRHP_061
WTPV_014
which constitutes 18.75 % of all partner plates in this batch
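A sketch of the plate-level check above. The report does not define the "expected mean read count", so here it is simply a parameter (one plausible choice would be the batch-wide sample mean); the input is a hypothetical dict of sample ID to read count, and quantiles again follow R's type 7 definition by assumption.

```python
import math
from collections import defaultdict


def quantile_type7(values, p):
    """Quantile by linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def low_plates(sample_reads, expected_mean):
    """Return (plates whose 75th percentile is below expected_mean, % of plates)."""
    by_plate = defaultdict(list)
    for sample_id, reads in sample_reads.items():
        by_plate["_".join(sample_id.split("_")[:2])].append(reads)
    flagged = sorted(
        plate for plate, reads in by_plate.items()
        if quantile_type7(reads, 0.75) < expected_mean
    )
    return flagged, 100 * len(flagged) / len(by_plate)
```

For this batch, 18 flagged plates out of 96 partner plates gives 18 / 96 = 18.75 %, matching the figure above.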
Grey line: median
Brown line: mean
Green data points: positive controls
Blue data points: empty negative controls
Navy data points: lysate negative controls
UMI plates where the 75th percentile of the read counts is lower than the expected mean read count (dark grey): 4, 7, 8
Percentage of samples from the low-performance partner plates that are located on the low-performance UMI plates (purple data points): 55.9230306674684 %
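The overlap percentage above is a set intersection. This sketch assumes the denominator is the number of samples on low-performance partner plates, which is how the sentence reads; the function and argument names are hypothetical.

```python
def overlap_percentage(low_partner_samples, low_umi_samples):
    """Percentage of low-partner-plate samples that also sit on low UMI plates."""
    low_partner = set(low_partner_samples)
    if not low_partner:
        return 0.0
    shared = low_partner & set(low_umi_samples)
    return 100 * len(shared) / len(low_partner)
```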
Assessment of the positive controls with low read counts detected in the previous steps:
FACE_201 Positive control failed.
Observed number of reads: 3 Expected: 102.161290322581
FACE_202 Positive control failed.
Observed number of reads: 55 Expected: 61.8817204301075
FACE_205 More reads in positive control than in samples on average.
Observed number of reads: 127 Expected: 50.4086021505376
FACE_207 More reads in positive control than in samples on average.
Observed number of reads: 119 Expected: 66.5698924731183
FACE_208 Positive control failed.
Observed number of reads: 75 Expected: 118.075268817204
FACE_205
FACE_207
FACE_208
FACE_201
FACE_202
The above plates have lower-than-expected read counts AND failed positive controls.
THESE PLATES NEED TO BE EXAMINED FURTHER
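The labels in the positive control assessment are consistent with a simple rule inferred from the five cases listed: a control below its expected value is labelled "failed", and one above it gets the "more reads" message. This is an inference, not the pipeline's documented logic, and how the expected value is derived is not stated in the report.

```python
def assess_positive_control(plate, observed, expected):
    """Hypothetical rule inferred from the report: below expectation means failed."""
    if observed < expected:
        return f"{plate} Positive control failed."
    return f"{plate} More reads in positive control than in samples on average."
```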
Low-quality plates are displayed here. All the other plates are plotted in the last part of this report.
Green squares: controls [any kind]
Positive control as contamination source
NOTE: All sample and sequence IDs match - data successfully merged
Positive control OTU is TAX:1287025
Non-positive control samples that contain positive control reads:
| Sample | Control Sequence Count | Sequence Similarity | Sequence Type | UMI Plate ID |
|---|---|---|---|---|
| BAYS_002_F11 | 1 | 99.84779 | secondary | 9 |
| BAYS_015_B6 | 1 | 99.84825 | secondary | 13 |
| BAYS_019_E8 | 1 | 99.84871 | secondary | 14 |
| LFLA_003_G8 | 1 | 99.84848 | secondary | 19 |
| LFLA_012_A5 | 1 | 98.31547 | secondary | 15 |
| RRHP_026_A4 | 1 | 99.84802 | secondary | 4 |
| RRHP_027_D8 | 1 | 100.00000 | primary | 5 |
| WTPV_020_H2 | 1 | 99.69697 | secondary | 17 |
Number of samples with positive control OTU as primary sequence: 1
Number of samples with positive control OTU as secondary sequence: 7 out of 5838 samples with secondary sequences
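A sketch of how non-control samples carrying the positive control OTU could be found in the merged table. It assumes hypothetical rows of `(sample, otu, sequence_type)` and that controls are recognisable by the `CONTROL_` prefix used throughout this report; the OTU identifier is taken from the report.

```python
from collections import Counter

POS_CONTROL_OTU = "TAX:1287025"  # positive control OTU reported above


def positive_control_hits(rows, otu=POS_CONTROL_OTU):
    """Return non-control (sample, sequence_type) hits and counts per sequence type."""
    hits = [
        (sample, seq_type)
        for sample, row_otu, seq_type in rows
        if row_otu == otu and not sample.startswith("CONTROL_")
    ]
    return hits, Counter(seq_type for _, seq_type in hits)
```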
Location of the contaminants relative to the source:
Orange square: positive controls
Green squares: samples with positive control contamination
Read count mean of all secondary sequences in all samples: 4.96586979401594
Read count mean of all positive control sequences in other samples: 1
Read count median of all secondary sequences in all samples: 1
Read count median of all positive control sequences in other samples: 1
Blue solid line: secondary hit read mean
Orange dotted lines: mean read count of positive control sequences found as secondary contaminants in other samples
Both lines should lie close together, meaning that the secondary contamination from positive controls is comparable to the potential contamination in other samples.
NOTE: Non-control samples with control reads recognised as the primary hit need to be examined further!
| Sample | Count | OTU | Sequence |
|---|---|---|---|
| RRHP_027_D8 | 1 | TAX:1287025 | primary |
Negative control contamination
Distribution of reads in negative controls
NOTE: contamination source can be either primary or secondary sequence within samples!
| Family | No. Source Samples |
|---|---|
| Chironomidae | 73 |
| Tachinidae | 51 |
| Hominidae | 9 |
| Culicidae | 7 |
| Anthomyiidae | 5 |
| Dolichopodidae | 3 |
| Platygastridae | 2 |
| Agromyzidae | 1 |
| Aleyrodidae | 1 |
| Aphididae | 1 |
| Cecidomyiidae | 1 |
| None | 1 |
| Scelionidae | 1 |
| Tipulidae | 1 |
Outline: negative controls with contaminants
Colour of the outline indicates the partner, so samples can be tracked between partner and UMI plates.
Thicker chartreuse outline: FAILED negative controls with contaminants [2%]
Numbers indicate the read count
Squares that are not outlined represent potential sources of contamination within plates: identical sequences were found in these wells and in the negative controls.
NOTE: Controls are not included!
Number of wells with a primary sequence only: 2327
Number of wells with primary and secondary sequences: 5794
Number of primary chimeric sequences: 56
Number of secondary chimeric sequences: 7600
NOTE: All secondary chimeric sequences successfully removed
Number of samples with only a primary chimeric sequence recognised: 28
We do not know how mBRAVE recognises chimeras; for now, only samples represented by fewer than 5 reads are removed
Retained samples: 6
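The interim chimera rule above (remove samples with fewer than 5 reads, keep the rest) can be sketched as a simple threshold filter. The input, a hypothetical dict of sample ID to read count for samples whose only sequence is chimeric, is an assumption about the data layout.

```python
def filter_primary_chimeras(chimeric_samples, min_reads=5):
    """Split chimera-only samples into retained (>= min_reads) and removed."""
    retained = {s: r for s, r in chimeric_samples.items() if r >= min_reads}
    removed = sorted(set(chimeric_samples) - set(retained))
    return retained, removed
```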
Number of EXCLUDED primary sequences: 499
which constitutes 6.16125447586122 % of all samples
These samples are not removed here; the exclusion reflects an mBRAVE cut-off
Number of primary sequences with no taxonomy assigned: 608
which constitutes 7.5070996419311 % of all samples
These samples are going to be examined further
Number of samples with no taxonomy assigned that will be replaced with the secondary sequence based on the sequence similarity: 150
Other sequences with no taxonomy assigned to the primary sequence will remain unchanged.
If the first entry is not ‘Arthropod’, then the second entry is likely correct [based on manual observations]
Number of samples with Wolbachia detected: 335
Table with plate positions, number of reads, and sequences saved to the output directory:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/
Number of samples with Nematoda, Tardigrada, Annelida, and/or Rotifera detected: 80
Table with plate positions, number of reads, and sequences saved to the output directory:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/
| Taxon | Frequency |
|---|---|
| Chordata | 20 |
| Nematoda | 9 |
| Proteobacteria | 54 |
| Rotifera | 2 |
85 wells had primary non-Arthropod hits and secondary Arthropod hits
NOTE: Primary hits are going to be replaced
Samples with only non-Arthropod sequences detected: 29
NOTE: These samples have been excluded!
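A sketch of the non-Arthropod handling above: a non-Arthropod primary hit is replaced by an Arthropod secondary hit where one exists, and the sample is excluded otherwise. It assumes each sample's hits are a hypothetical list of `(phylum, reads)` tuples with the primary first and secondaries ordered by read count. Primaries with no taxonomy (`None`) are left unchanged, as stated earlier in the report; the separate similarity-based replacement of no-taxonomy primaries is not modelled here.

```python
def resolve_non_arthropod(hits_by_sample):
    """Return (kept, excluded); kept maps sample -> chosen (phylum, reads) hit."""
    kept, excluded = {}, []
    for sample, hits in hits_by_sample.items():
        primary = hits[0]
        if primary[0] in ("Arthropoda", None):
            # Arthropod primaries and primaries with no taxonomy stay as they are
            kept[sample] = primary
            continue
        arthropod_secondaries = [h for h in hits[1:] if h[0] == "Arthropoda"]
        if arthropod_secondaries:
            kept[sample] = arthropod_secondaries[0]  # replace the primary hit
        else:
            excluded.append(sample)  # only non-Arthropod sequences detected
    return kept, excluded
```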
Number of African Anopheline samples: 50
Number of primary African Anopheline hits [250 or more reads]: 24
NOTE: All primary mosquito samples removed!
Number of samples with only a primary Arthropod sequence: 5779, which is 73.1982267257758 % of all remaining samples
Number of samples where secondary sequence is not present elsewhere on the partner or UMI plate: 0
Number of conflicting sequences [sequences are in different families or orders, both have good read support]: 231
| Primary hit | Number |
|---|---|
| Arthropoda | 7210 |
| None | 531 |
Number of retained samples: 7741
Number of Arthropod samples assigned by mBRAVE [this includes samples with 5 or fewer reads that have now been excluded!]: 7376
Number of samples with replaced sequences: 34
Retained chimeras: 20
Retained samples with no taxonomy: 531
Each retrieved sample has only one sequence: TRUE
| Number of samples | Description | Category | Decision |
|---|---|---|---|
| 4402 | Only one sequence with more than 200 reads, no secondary sequence detected | 1 | YES |
| 863 | Only one sequence with 50 to 200 reads, no secondary sequence detected | 2 | YES |
| 482 | Only one sequence with 5 or more but fewer than 50 reads, no secondary sequence detected | 3 | YES |
| 123 | Dominant sequence with more than 200 reads, non-conflicting secondary sequences with 5 or fewer reads | 4 | YES |
| 48 | Dominant sequence with 50 to 200 reads, non-conflicting secondary sequences with 5 or fewer reads | 5 | YES |
| 671 | Dominant sequence with more than 200 reads, conflicting secondary sequences with 5 or fewer reads | 6 | YES |
| 376 | Dominant sequence with 50 to 200 reads, conflicting secondary sequences with 5 or fewer reads | 7 | YES |
| 279 | Dominant sequence with more than 200 reads, secondary sequences with more than 5 supporting reads | 8 | NO |
| 177 | Dominant sequence with 50 to 200 reads, secondary sequences with more than 5 supporting reads | 9 | NO |
| 251 | Dominant sequence with 5 or more but fewer than 50 reads, non-conflicting secondary sequences with fewer than 5 reads | 10 | NO |
| 69 | Dominant sequence with more than 5 but fewer than 50 reads, any other secondary reads present | 11 | NO |
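The decision table above can be read as a small classifier over three inputs: the dominant sequence's read count, the read counts of any secondary sequences, and whether any secondary conflicts (different family or order). This is a sketch of one reading of the table, not the pipeline's code. Categories 10 and 11 overlap as worded; here a well goes to 10 when all secondaries are non-conflicting and below 5 reads, and to 11 otherwise, which is an interpretation.

```python
def categorise(dominant_reads, secondary_reads, conflicting):
    """Return (category, decision) for one well, following the table above.

    dominant_reads: reads supporting the dominant sequence
    secondary_reads: read counts of secondary sequences (empty if none)
    conflicting: True if any secondary is in a different family or order
    """
    top_secondary = max(secondary_reads, default=0)
    if dominant_reads > 200:
        tier = 0
    elif dominant_reads >= 50:
        tier = 1
    elif dominant_reads >= 5:
        tier = 2
    else:
        return None, None  # not covered by the table; removed in earlier steps

    if tier < 2:
        if not secondary_reads:
            category = 1 + tier  # categories 1 and 2
        elif top_secondary > 5:
            category = 8 + tier  # categories 8 and 9
        elif conflicting:
            category = 6 + tier  # categories 6 and 7
        else:
            category = 4 + tier  # categories 4 and 5
    elif not secondary_reads:
        category = 3
    elif not conflicting and top_secondary < 5:
        category = 10
    else:
        category = 11
    return category, "YES" if category <= 7 else "NO"
```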
| Decision category | Number of samples |
|---|---|
| NO | 776 |
| YES | 6965 |
8.91869631721379 % OF SAMPLES EXCLUDED [all samples]
18.0491822567361 % OF SAMPLES EXCLUDED [only approved samples]
NOTE: The heatmaps below show only the retained samples. Controls, chimeric samples, non-Arthropod samples, and samples with no taxonomy assigned have been removed or replaced!
Final fasta file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/BOLD_filtered_sequences_batch18.fasta
Final metadata file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/BOLDfiltered_metadata_batch18.csv
The report and output files have been successfully generated!
| Partner plate | Original number of samples | Number of samples after filtering | Percentage retained (%) |
|---|---|---|---|
| WTPV_018 | 93 | 93 | 100.000000 |
| RRHP_063 | 14 | 14 | 100.000000 |
| WTPV_025 | 49 | 48 | 97.959184 |
| BAYS_018 | 93 | 91 | 97.849462 |
| FACE_011 | 93 | 91 | 97.849462 |
| RRHP_045 | 93 | 91 | 97.849462 |
| WTPV_016 | 93 | 91 | 97.849462 |
| WTPV_019 | 93 | 91 | 97.849462 |
| RRHP_061 | 93 | 90 | 96.774193 |
| FACE_009 | 93 | 89 | 95.698925 |
| FACE_010 | 93 | 89 | 95.698925 |
| LFLA_010 | 93 | 89 | 95.698925 |
| WTPV_020 | 93 | 89 | 95.698925 |
| BAYS_022 | 93 | 88 | 94.623656 |
| WTPV_013 | 93 | 88 | 94.623656 |
| WTPV_015 | 93 | 88 | 94.623656 |
| LFLA_014 | 18 | 17 | 94.444444 |
| BAYS_003 | 93 | 87 | 93.548387 |
| WTPV_014 | 93 | 87 | 93.548387 |
| WTPV_022 | 93 | 87 | 93.548387 |
| LFLA_002 | 31 | 29 | 93.548387 |
| CAMP_025 | 75 | 70 | 93.333333 |
| BAYS_014 | 93 | 86 | 92.473118 |
| CAMP_036 | 93 | 86 | 92.473118 |
| RRHP_029 | 93 | 86 | 92.473118 |
| RRHP_048 | 53 | 49 | 92.452830 |
| FACE_006 | 93 | 85 | 91.397850 |
| WTPV_021 | 93 | 85 | 91.397850 |
| WTPV_024 | 93 | 85 | 91.397850 |
| CAMP_029 | 93 | 84 | 90.322581 |
| CAMP_034 | 93 | 84 | 90.322581 |
| CAMP_035 | 93 | 84 | 90.322581 |
| FACE_012 | 93 | 84 | 90.322581 |
| FACE_014 | 93 | 84 | 90.322581 |
| LFLA_001 | 93 | 84 | 90.322581 |
| RRHP_046 | 93 | 84 | 90.322581 |
| RRHP_062 | 93 | 84 | 90.322581 |
| BAYS_002 | 93 | 83 | 89.247312 |
| BAYS_015 | 93 | 83 | 89.247312 |
| BAYS_019 | 93 | 83 | 89.247312 |
| CAMP_031 | 93 | 83 | 89.247312 |
| FACE_007 | 93 | 83 | 89.247312 |
| FACE_008 | 93 | 83 | 89.247312 |
| FACE_013 | 93 | 83 | 89.247312 |
| RRHP_047 | 93 | 83 | 89.247312 |
| WTPV_017 | 93 | 83 | 89.247312 |
| BAYS_020 | 93 | 82 | 88.172043 |
| FACE_213 | 93 | 82 | 88.172043 |
| BAYS_001 | 93 | 81 | 87.096774 |
| CAMP_030 | 93 | 81 | 87.096774 |
| BAYS_011 | 93 | 80 | 86.021505 |
| CAMP_033 | 93 | 80 | 86.021505 |
| LFLA_003 | 93 | 80 | 86.021505 |
| CAMP_024 | 74 | 63 | 85.135135 |
| BAYS_008 | 93 | 79 | 84.946237 |
| WTPV_023 | 93 | 79 | 84.946237 |
| BAYS_017 | 93 | 78 | 83.870968 |
| LFLA_007 | 93 | 78 | 83.870968 |
| BAYS_010 | 93 | 77 | 82.795699 |
| CAMP_032 | 93 | 77 | 82.795699 |
| CAMP_037 | 93 | 77 | 82.795699 |
| FACE_018 | 93 | 77 | 82.795699 |
| LFLA_008 | 93 | 77 | 82.795699 |
| LFLA_004 | 93 | 76 | 81.720430 |
| BAYS_005 | 93 | 75 | 80.645161 |
| FACE_017 | 93 | 75 | 80.645161 |
| BAYS_004 | 93 | 74 | 79.569892 |
| BAYS_016 | 93 | 74 | 79.569892 |
| FACE_016 | 93 | 74 | 79.569892 |
| FACE_210 | 93 | 74 | 79.569892 |
| LFLA_050 | 93 | 74 | 79.569892 |
| FACE_015 | 93 | 73 | 78.494624 |
| BAYS_013 | 93 | 72 | 77.419355 |
| LFLA_005 | 93 | 72 | 77.419355 |
| LFLA_006 | 93 | 72 | 77.419355 |
| BAYS_009 | 93 | 71 | 76.344086 |
| BAYS_012 | 93 | 70 | 75.268817 |
| FACE_212 | 93 | 69 | 74.193548 |
| BAYS_006 | 93 | 68 | 73.118280 |
| LFLA_009 | 93 | 66 | 70.967742 |
| FACE_208 | 93 | 64 | 68.817204 |
| FACE_020 | 93 | 63 | 67.741935 |
| BAYS_007 | 93 | 62 | 66.666667 |
| FACE_217 | 93 | 62 | 66.666667 |
| FACE_211 | 93 | 61 | 65.591398 |
| FACE_204 | 93 | 58 | 62.365591 |
| FACE_202 | 93 | 56 | 60.215054 |
| FACE_205 | 93 | 55 | 59.139785 |
| FACE_203 | 93 | 54 | 58.064516 |
| FACE_207 | 93 | 48 | 51.612903 |
| FACE_201 | 93 | 47 | 50.537634 |
| FACE_218 | 12 | 6 | 50.000000 |
| LFLA_012 | 93 | 45 | 48.387097 |
| FACE_206 | 93 | 44 | 47.311828 |
| RRHP_026 | 93 | 3 | 3.225807 |
| RRHP_027 | 82 | 2 | 2.439024 |
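The per-plate retention table above can be sketched from two lists of sample IDs, before and after filtering. The plate ID is assumed to be the first two underscore-separated fields of the sample ID; the function names are hypothetical.

```python
from collections import Counter


def plate_id(sample_id):
    """Partner plate ID, e.g. 'LFLA_010_A2' -> 'LFLA_010'."""
    return "_".join(sample_id.split("_")[:2])


def retention_by_plate(original_ids, retained_ids):
    """Rows of (plate, original, retained, percentage), highest retention first."""
    before = Counter(plate_id(s) for s in original_ids)
    after = Counter(plate_id(s) for s in retained_ids)
    rows = [(p, n, after[p], 100 * after[p] / n) for p, n in before.items()]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

For example, RRHP_061 keeps 90 of 93 samples: 100 × 90 / 93 ≈ 96.77 %.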
| Partner | Original number of samples | Number of samples after filtering | Percentage retained (%) |
|---|---|---|---|
| WTPV | 1165 | 1094 | 93.90558 |
| CAMP | 986 | 869 | 88.13387 |
| BAYS | 1953 | 1644 | 84.17819 |
| LFLA | 1072 | 859 | 80.13060 |
| FACE | 2523 | 1913 | 75.82243 |
| RRHP | 800 | 586 | 73.25000 |
| UMI plate | Original number of samples | Number of samples after filtering | Percentage retained (%) |
|---|---|---|---|
| 17 | 372 | 356 | 95.69892 |
| 16 | 372 | 354 | 95.16129 |
| 2 | 372 | 353 | 94.89247 |
| 6 | 332 | 306 | 92.16867 |
| 14 | 372 | 337 | 90.59140 |
| 18 | 372 | 336 | 90.32258 |
| 24 | 372 | 331 | 88.97849 |
| 1 | 353 | 314 | 88.95184 |
| 13 | 372 | 326 | 87.63441 |
| 23 | 372 | 321 | 86.29032 |
| 3 | 372 | 314 | 84.40860 |
| 9 | 354 | 298 | 84.18079 |
| 19 | 328 | 276 | 84.14634 |
| 12 | 372 | 308 | 82.79570 |
| 10 | 372 | 304 | 81.72043 |
| 22 | 291 | 234 | 80.41237 |
| 20 | 372 | 293 | 78.76344 |
| 21 | 372 | 293 | 78.76344 |
| 11 | 372 | 289 | 77.68817 |
| 5 | 361 | 263 | 72.85319 |
| 15 | 235 | 165 | 70.21277 |
| 4 | 372 | 218 | 58.60215 |
| 7 | 293 | 171 | 58.36177 |
| 8 | 372 | 205 | 55.10753 |
Plates with low read counts and low numbers of retained samples should be examined!
Failed negative controls [2%] with contamination other than Bovidae:
CONTROL_NEG_LYSATE_BAYS_005_H12
CONTROL_NEG_LYSATE_BAYS_007_H12
CONTROL_NEG_LYSATE_BAYS_008_H12
CONTROL_NEG_LYSATE_BAYS_010_H12
CONTROL_NEG_LYSATE_CAMP_025_B1
CONTROL_NEG_LYSATE_CAMP_025_B2
CONTROL_NEG_LYSATE_FACE_203_H12
CONTROL_NEG_LYSATE_FACE_218_A8
CONTROL_NEG_LYSATE_FACE_218_B7
CONTROL_NEG_LYSATE_FACE_218_E8
CONTROL_NEG_LYSATE_FACE_218_F8
CONTROL_NEG_LYSATE_RRHP_027_F11
CONTROL_NEG_LYSATE_WTPV_017_H12
These samples may have insects in them!
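The list of failed negative controls above can be reproduced by filtering the top-2% controls for any contaminant family other than Bovidae. The input layout, a hypothetical dict from control name to the set of families detected in it, is an assumption.

```python
def failed_non_bovid(failed_controls, contaminant_families):
    """Failed negative controls carrying at least one family other than Bovidae."""
    return sorted(
        control for control in failed_controls
        if contaminant_families.get(control, set()) - {"Bovidae"}
    )
```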